ABSTRACT

A Dynamic Rate Control (DRC) scheduler for scheduling cells for service in a generic Asynchronous Transfer Mode (ATM) switch is disclosed. According to the inventive DRC, each traffic stream associated with an internal switch queue is rate-shaped according to a rate which consists of a minimum guaranteed rate and a dynamic component computed based on congestion information within the switch. While achieving high utilization, DRC guarantees a minimum throughput for each stream and fairly distributes unused bandwidth. The distribution of unused bandwidth in DRC can be assigned flexibly, i.e., the unused bandwidth need not be shared in proportion to the minimum throughput guarantees, as in weighted fair share schedulers. Moreover, an effective closed-loop QoS control can be built into DRC by dynamically updating a set of weights based on observed QoS. Another salient feature of DRC is its ability to control internal congestion at bottleneck points within a multistage switch. DRC can also be extended beyond the local switch in a hop-by-hop fashion.

15 Claims, 6 Drawing Sheets
DYNAMIC RATE CONTROL SCHEDULER FOR ATM NETWORKS

This Application relates to U.S. application Ser. No. 08/923,978, now U.S. Pat. No. 6,324,165, and Ser. No. 09/040,311, which are incorporated herein by reference.

BACKGROUND OF THE INVENTION

1. Field of the Invention

The present invention relates to a control scheduler for an ATM network and, more specifically, to a scheduler which guarantees a minimum rate of transmission while fairly distributing any unused bandwidth.

2. Description of Related Art

High-speed networks based on Asynchronous Transfer Mode (ATM) are expected to carry services with a wide range of traffic characteristics and quality-of-service (QoS) requirements. For example, in audio transmission a cell is useless to the receiver if it is delayed beyond the specified delay bound. On the other hand, video transmission is very bursty and, unless shaped at the entry point, may cause temporary congestion and delay other cells. Integrating all services in one network with a uniform transport mechanism can potentially simplify network operation and improve network efficiency. In order to realize these potential benefits, an efficient and fair means of allocating the network resources is essential.

A central problem in allocating the network resources is the manner in which the service to the various users is prioritized. A simple model is to use a First-In First-Out (FIFO) algorithm. In a simple FIFO scheduler, however, there is no way of guaranteeing that each stream gets its assigned rate. During some interval of time, a given stream may transmit at a rate higher than its assigned rate $M_i$ and thereby steal bandwidth from other streams which are transmitting at or below their assigned rates. This problem led to the development of various mechanisms for shaping the entry to the network, such as the known leaky bucket algorithm. For example, the output stream for each queue can be peak-rate shaped to a predetermined rate $M_i$.

FIG. 1 shows a static rate control (SRC) scheduler with N stream queues, SQ1, SQ2, . . . , SQN, one queue corresponding to each stream. The SRC scheduler serves queue i at the constant rate $M_i$, and the output cell streams are fed to a common bottleneck queue CQ which is served at a given rate C. Service from the common queue CQ corresponds to cell transmission over a link of capacity C.

Rate-shaping transforms the streams into constant rate streams (assuming all queues are continuously backlogged). Considering the relationship

$$\sum_{i=1}^{N} M_i \le C \qquad (1)$$

(to be developed further below), the bottleneck queue will be stable; in fact, the maximum queue length is N. Moreover, strict inequality in (1) will usually hold, implying that the cell delay in the common queue will be small with high probability. Although the service discipline depicted in FIG. 1 is work-conserving with respect to the stream queues, it is non-work-conserving with respect to the common queue, since the common queue may go empty even when at least one of the stream queues is non-empty. This scheduler is similar to a circuit-switched system, except for the asynchronous nature of the cell streams.

If the rates, $M_i$, have been computed correctly based on the stream traffic characteristics and QoS requirements, the minimum rate scheduler should succeed in guaranteeing QoS for all of the streams. However, because this scheduler is non-work-conserving with respect to the common queue, bandwidth could be wasted for one of two reasons:

The connection admission control (CAC) algorithm was optimistic in its computation of $M_i$. It may be the case that a bandwidth of $M_i + \Delta$ over short intervals of time is required to ensure that QoS is met for stream i.

The traffic stream could include low priority cells, with the cell loss priority (CLP) bit set to one.

In the first case, a stream should be allowed to make use of bandwidth beyond its allocated rate $M_i$ if the bandwidth is available. In the second case, the QoS guarantee applies only to cells that conform to the negotiated traffic contract, i.e., cells with cell loss priority (CLP) set to zero. However, if bandwidth is available, a stream should be permitted to transmit nonconforming cells, i.e., cells tagged as CLP=1 cells, over and above the allocated minimum rate for CLP=0 cells. If bandwidth is not available, CLP=1 cells should be dropped before CLP=0 cells; i.e., there should be a lower threshold for dropping CLP=1 cells. (As is known in the art, when a source transmits at a rate higher than the negotiated rate, its violating cells are tagged by setting their CLP to 1.)

Clearly, the problem with the minimum rate scheduler is that streams cannot make use of excess bandwidth even when it is available. In minimum rate scheduling, there is no statistical multiplexing among cells belonging to different streams. (As is known in the art, statistical multiplexing takes into account economies of scale, i.e., the bandwidth it takes to transmit all the streams together is less than the sum of the individual bandwidths required to transmit each stream.) A simple way to enhance the scheme is to provide a means for serving a cell from a non-empty queue whenever bandwidth is available. During a cell time, if the common queue is empty, the scheduler services a cell from one of the non-empty stream queues.

According to another prior art method, the queue selection is done in a round-robin fashion, and the excess bandwidth is shared among the active streams. A disadvantage of such a scheduler is that queues are served without regard to QoS. That is, the bandwidth is allotted to the queues sequentially without regard to the urgency of transmission, i.e., the requested minimum rate, of any specific source. Therefore, this method does not lend itself well to serving different classes having different QoS requirements.

Accordingly, there has been considerable interest in packet scheduling algorithms which are intended to provide weighted shares of the bandwidth on a common link to competing traffic streams, so as to enable service of different classes. With slightly more complexity, the excess bandwidth can be shared using Weighted Round-Robin (WRR), Weighted Fair Queuing (WFQ), Virtual Clock, and their variants, which attempt to approximate the idealized Generalized Processor Sharing (GPS) scheduling, assuming a fluid model of traffic. For WRR, see M. Katevenis, S. Sidiropoulos, and C. Courcoubetis, Weighted Round-Robin Cell Multiplexing in a General-Purpose ATM Switch, IEEE JSAC, Vol. 9, pp. 1265-1279, October 1991. For WFQ, see A. K. Parekh and R. G. Gallager, A Generalized Processor Sharing Approach to Flow Control in Integrated Services Networks: The Single-Node Case, IEEE/ACM Trans. on Networking, vol. 1, pp. 344-357, June 1993. For Virtual Clock, see L. Zhang, Virtual Clock: A New Traffic Control Algorithm for Packet Switching, ACM Trans. on Computer Systems, vol. 9, pp. 101-124, May 1991. In these schedulers, each stream is assigned a weight corresponding
to the QoS requested by a user of the stream. Accordingly, 
over an interval of time in which the number of active 
streams is fixed, the bandwidth received by an active stream 
should be roughly proportional to the assigned weight. 

By an appropriate assignment of weights, each stream can 
be provided with a share of the link bandwidth that is 
proportional to its weight. Hence, each stream receives a 
minimum bandwidth guarantee. If a stream cannot make use 
of all of its guaranteed bandwidth, the excess bandwidth is 
shared among the active streams in proportion to the 
weights. However, a stream with a larger weight will not 
only receive a higher bandwidth guarantee, but also receive 
larger shares of the available bandwidth than streams with 
smaller weights. Thus, the weight assigned to a connection 
determines not only its minimum bandwidth guarantee, but 
also its share of the available unused bandwidth. 

In this specification the term "weighted fair share sched- 
uler" is used generally to refer to a general class of work- 
conserving schedulers which schedule cells so as to give 
each stream a share of the link bandwidth which is approxi- 
mately proportional to a pre -assigned weight. A work- 
conserving scheduler transmits a cell over the link whenever 
there is at least one cell in queue. Thus, a work-conserving 
scheduler basically determines the order in which queued 
cells should be serviced. The operation of such a scheduler 
is described in the following. 

Consider an idealized fluid model for each traffic stream. Let $w_i$ be the weight assigned to stream i. At time t, the Generalized Processor Sharing (GPS) discipline serves stream i at rate:

$$r_i(t) = \frac{w_i}{\sum_{j \in A(t)} w_j}\, C, \qquad i \in A(t) \qquad (2)$$

where A(t) is the set of backlogged streams at time t. Thus, 
each stream always receives a share of the available band- 
width which is proportional to its weight. Because of the 
discrete nature of cells or packets, a real scheduler can only 
approximate GPS scheduling. PGPS (Packet-by-Packet Generalized Processor Sharing), also known as Weighted Fair
Queuing (WFQ) noted above, and its variants (cf. S. J. 
Golestani, A Self-Clocked Fair Queuing Scheme for Broad- 
band Applications, in IEEE INFOCOM '94, Toronto, June 
1994; and J. C. R. Bennett and H. Zhang, WF2Q: Worst- 
Case Fair Weighted Fair Queuing, in IEEE INFOCOM '96, 
San Francisco, pp. 120-128, March 1996) are schedulers 
which approximate GPS for packet scheduling. Other 
examples of scheduling schemes which attempt to achieve 
fair sharing are the Virtual Clock and Weighted Round- 
Robin noted above. Several other weighted fair share sched- 
ulers have been proposed in the literature. 
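As a minimal illustration of the GPS rate of Eq. (2), the following Python sketch computes the fluid service rate of each stream from its weight, the set of backlogged streams A(t), and the link capacity C. The function name `gps_rates` and the dictionary representation are hypothetical choices made only for this example; packet schedulers such as WFQ only approximate these fluid rates cell by cell.

```python
from typing import Dict, Set


def gps_rates(weights: Dict[str, float], backlogged: Set[str],
              capacity: float) -> Dict[str, float]:
    """Fluid GPS rates per Eq. (2): r_i(t) = w_i * C / sum_{j in A(t)} w_j.

    Streams outside the backlogged set A(t) are given rate 0.
    """
    total = sum(weights[j] for j in backlogged)
    return {
        i: (w * capacity / total if i in backlogged and total > 0 else 0.0)
        for i, w in weights.items()
    }


if __name__ == "__main__":
    # Three streams on a link of capacity 120 (arbitrary units); stream "c" is idle.
    print(gps_rates({"a": 1.0, "b": 2.0, "c": 1.0}, {"a", "b"}, 120.0))
    # a: 40.0, b: 80.0, c: 0.0
```

The idle stream's weight drops out of the denominator, which is how its share is redistributed among the backlogged streams in proportion to their weights.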

It should be appreciated that the assigned weight and the 
current usage of the network would determine whether a 
QoS requested by an incoming call can be guaranteed. 
Therefore, various Connection Admission Control (CAC) 
algorithms have been developed which decline service when 
the QoS cannot be guaranteed. For that matter, the CAC 
algorithm must be able to predict the load on the system, 
including the newly received call if admitted. Therefore, 
delay bounds have been found for WRR, WFQ, Virtual 
Clock and other fair share packet scheduling algorithms. 
Using these delay bounds, admission control schemes can be 
devised to provide worst-case delay guarantees. The delay 
bounds are typically obtained by assuming worst-case 
behavior for streams controlled by leaky bucket-type open-loop flow control mechanisms. However, a problem in such
an algorithm is that the calculated bounds tend to be rather 
loose, since worst-case deterministic assumptions are made 
in obtaining the bounds. 

Another problem with the prior art schedulers is as 
follows. Conventionally, schedulers have been designed so 
that they are work-conserving with respect to the stream 
queues, in the sense that whenever link bandwidth is avail- 
able and a packet is in the queue, a packet will be transmitted 
over the link. In other words, if a packet is available for 
transmission and there is sufficient bandwidth, the packet 
will be transmitted and the scheduler will not idle. The 
work-conserving approach has been promoted in the prior 
art since it presumably results in the highest possible utili- 
zation over the link. 

However, within a switching system or the network, there 
may be several bottlenecks. For example, some of the 
streams may be bottlenecked at a downstream link at another 
stage within the switch or the network. In this case, provid- 
ing these streams more bandwidth than their minimum 
guaranteed rates (when bandwidth is available) could exac- 
erbate the congestion at the downstream bottleneck. Such 
congestion cannot be alleviated by the prior art schedulers 
because they are work-conserving with respect to a single 
bottleneck, servicing cells only in accordance with the 
available bandwidth at this bottleneck. That is, conventional 
weighted fair share schedulers always ensure that excess 
bandwidth is utilized and that the share of excess bandwidth 
made available to each queue is proportional to its weight, 
but they do not exercise control on the absolute value of the
rate received at a bottleneck point. 

Additionally, if there is a downstream bottleneck, typi- 
cally backpressure signals which throttle upstream traffic are 
used to alleviate congestion. However, backpressure signals 
are susceptible to on/off oscillations, resulting in higher cell
delay variation (CDV) and, more significantly, loss of 
throughput due to the equalization of bandwidth distribu- 
tion. That is, in the prior art when a buffer reaches its limit, 
a backpressure signal is sent to the source. Upon receiving 
the signal the source would stop transmission until the buffer 
signals that the pressure was relieved. However, at that time 
it is likely that all the sources would start transmission again 
concurrently, thereby overloading the buffer again so that the 
backpressure signal is again generated. Therefore, the system may oscillate for some time, causing large variation in
cell delay. Additionally, since all the sources would stop and 
start transmission at the same time, the throughput would be 
equalized irrespective of the QoS requested by each source. 

Since weighted fair share schedulers schedule cells only 
with respect to a single bottleneck, throughput for a cell 
stream may suffer because of backpressure resulting from 
downstream congestion. Hence, it may not be possible to 
guarantee a minimum throughput in this case. Consequently, 
while the prior art weighted share scheduler is work- 
conserving with respect to a bottleneck link, it may be 
non-work-conserving with respect to a downstream bottle- 
neck. Thus, the present inventors have determined that 
work-conservation is not always a desirable property and 
may lead to further congestion downstream. 

Yet another problem with the prior art weighted fair share schedulers is that they require an algorithm for searching and sorting the timestamps applied to the cells in order to determine the next queue to service. More specifically, in
the prior art the time stamps are relative, i.e., the scheduler 
needs to continuously order the cells according to their 
timestamp. For example, the scheduler may order the cells 
according to the length of the timestamp or according to the 
time remaining before the cell would be discarded. Such calculations may slow down the scheduler.

As is known in the art, the ATM Forum has established four main classes of traffic, generally divided into real time traffic and non-real time traffic. Constant Bit Rate (CBR) and Variable Bit Rate (VBR) are used for real time traffic, e.g., audio and video. Available Bit Rate (ABR) and Unspecified Bit Rate (UBR) are non-real time traffic, and are mainly used for computer communication. As can be appreciated, ABR traffic has no minimum rate requirement, and the main goal in scheduling ABR cells is to "pump" as many cells as possible using the available bit rate.

A dual proportional-derivative (PD) controller for ABR service has been proposed in A. Kolarov and G. Ramamurthy, Design of a Closed Loop Feed Back Control for ABR Service, in Proc. IEEE INFOCOM '97, Kobe, Japan, April 1997. The scheduler is implemented on a network-wide basis using resource management (RM) cells. Generally, the source generates RM cells which propagate through the network. As each RM cell passes through a switch, it is updated to indicate the supportable rate, i.e., the rate at which the source should transmit the data (generally called the explicit rate). These RM cells are fed back to the source so that the source may adjust its transmission rate accordingly.

However, the propagation of the RM cells through the network causes a large delay in controlling the source. While such delay is acceptable for scheduling ABR cells, it is unacceptable for scheduling real time traffic. Moreover, the delay needs to be accounted for by the scheduler, which complicates the computations and slows down the scheduler.

SUMMARY OF THE INVENTION

The present invention provides a new scheduling scheme which uses statistical approaches to admission control so as to provide much higher utilizations, while maintaining the guaranteed QoS. The general concept of the present invention is to construct the rate from two components: (1) a minimum guaranteed rate, and (2) a portion of the unused bandwidth. Constructing the rate from two components allows the scheduler to operate under at least three modes: (1) full available rate (i.e., minimum guaranteed rate plus a portion of the unused bandwidth), (2) minimum guaranteed rate, and (3) halt transmission (with very small probability). In its preferred form, the inventive scheduling scheme decouples the minimum guaranteed rate from the portion of unused bandwidth and is called Dynamic Rate Control (DRC).

The DRC first distributes the bandwidth so as to support the guaranteed QoS, i.e., it supports the minimum guaranteed rate. Then, the DRC distributes any unused bandwidth to users based upon criteria which, in the preferred embodiment, are independent of the minimum rate guaranteed to the users. A notable feature of the inventive DRC is that it is not necessarily work-conserving, but rather takes into account downstream bottlenecks in determining whether to allocate unused bandwidth.

As noted above, a disadvantage of the prior art weighted fair sharing is that the entire bandwidth is allocated according to the assigned weights. However, it might be desirable to determine the service rate according to:

$$r_i(t) = M_i + \frac{\hat{w}_i}{\sum_{j \in A(t)} \hat{w}_j}\, E(t) \qquad (10)$$

where E(t) is the unused (excess) bandwidth at time t and, in general, $\hat{w}_i \ne w_i$. In Eq. (10), the minimum rate guarantee and the excess bandwidth for a stream are decoupled. This decoupling allows the network provider to distribute the unused bandwidth independently of the minimum guaranteed rates. Because the inventive DRC scheduler naturally decouples the minimum rate guarantee from the excess bandwidth allocated to a stream, weights can be assigned on a per-class basis by the CAC or dynamically via a closed-loop QoS control mechanism. Thus, for example, for UBR it may be preferable to assign a very small or even zero guaranteed minimum rate, but to provide a large portion of the available bandwidth. This will help satisfy many real-time calls, while providing service for non-real time UBR when bandwidth is available.

As noted above, since a work-conserving scheduler transmits a cell whenever there is at least one cell in a queue, it only determines the order in which queued cells should be served. By contrast, a non-work-conserving scheduler may allow a cell time on the link to go idle even if there are cells in queue. Hence, in addition to the ordering of cells for service, timing is also important in non-work-conserving schedulers. Therefore, the present inventors have developed mechanisms to account for both the ordering and the timing of cell transmission. However, unlike the prior art, in the present invention the timestamps are absolute rather than relative. That is, at a given current time, CT, any cell having a timestamp which equals the current time is eligible for service. Therefore, there is no need for constant ordering of the cells according to their timestamps.

BRIEF DESCRIPTION OF THE DRAWINGS

FIG. 1 is a schematic illustrating a static rate control scheduler according to the prior art.

FIG. 2 is a schematic illustrating the dynamic rate control according to the present invention.

FIG. 3 is a block diagram depicting an embodiment of an inventive controller based on matching target utilization.

FIG. 4 is a block diagram depicting an embodiment of an inventive controller based on matching target queue length.

FIG. 5 is a schematic illustrating the inventive DRC scheduling with overload control.

FIG. 6 is a schematic illustrating the inventive DRC scheduling with multiple bottlenecks.

FIG. 7 is a schematic illustrating a rate-shaping scheduler structure for class queuing.

FIG. 8 is a schematic illustrating a rate-shaping scheduler structure for per virtual channel queuing.

FIG. 9 is a schematic illustrating an input output buffered switch.

DETAILED DESCRIPTION OF THE PREFERRED EMBODIMENTS

A feature of the present invention is the provision of a rate which includes two components: a minimum guaranteed rate and a share of the excess bandwidth. This allows the scheduler to provide service according to various QoS requirements and to shape the rates while accounting for downstream bottlenecks.

Unlike GPS-type schedulers, which distribute the entire bandwidth according to assigned weights, the inventive scheduler first services the minimum rate and then may or may not distribute the unused bandwidth. In its simpler version, this approach can be derived as follows:

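A fair share scheduler provides stream i with a minimum bandwidth guarantee which is a function of the entire available bandwidth, i.e.,

$$M_i = \frac{w_i}{\sum_{j=1}^{N} w_j}\, C \qquad (3)$$

Clearly,

$$\sum_{i=1}^{N} M_i = C \qquad (4)$$

However, it is preferable to separate the rate into two components: the share of the bandwidth as provided by the minimum rate, plus the share of the unused bandwidth. When A(t) is the set of active streams, the two-component rate can be written as:

$$r_i(t) = w_i \left( \frac{C}{\sum_{j=1}^{N} w_j} + \frac{E(t)}{\sum_{j \in A(t)} w_j} \right) \qquad (5)$$

$$r_i(t) = M_i + \frac{w_i}{\sum_{j \in A(t)} w_j}\, E(t) \qquad (6)$$

where

$$\begin{aligned} E(t) &= C - \sum_{j \in A(t)} M_j && (7) \\ &= \sum_{j=1}^{N} M_j - \sum_{j \in A(t)} M_j && (8) \\ &= \sum_{j \notin A(t)} M_j && (9) \end{aligned}$$
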
is the excess bandwidth available at time t. Referring to Eq. (6), we see that the rate at which stream i is served is the sum of the minimum guaranteed rate $M_i$ and a weighted fraction of the unused bandwidth E(t). Thus, depending on the load on downstream buffers, the scheduler may or may not use the second component, i.e., may or may not distribute the unused bandwidth. However, it always continues to provide the guaranteed minimum rate.
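The equivalence between the GPS rate of Eq. (2) and the two-component form of Eqs. (3), (6) and (7) can be checked numerically. The sketch below is illustrative only; the function names are hypothetical, and the weights and capacity are arbitrary example values.

```python
from typing import Dict, Set


def min_guarantees(weights: Dict[str, float], capacity: float) -> Dict[str, float]:
    """Eq. (3): M_i = w_i * C / sum over all N streams of w_j."""
    total = sum(weights.values())
    return {i: w * capacity / total for i, w in weights.items()}


def two_component_rates(weights: Dict[str, float], active: Set[str],
                        capacity: float) -> Dict[str, float]:
    """Eq. (6): r_i(t) = M_i + w_i / sum_{j in A(t)} w_j * E(t)."""
    m = min_guarantees(weights, capacity)
    excess = capacity - sum(m[j] for j in active)  # Eq. (7)
    w_active = sum(weights[j] for j in active)
    return {
        i: (m[i] + weights[i] / w_active * excess if i in active else 0.0)
        for i in weights
    }


if __name__ == "__main__":
    w = {"a": 1.0, "b": 2.0, "c": 1.0}
    rates = two_component_rates(w, {"a", "b"}, 120.0)
    # Same result as GPS, Eq. (2): w_i * C / sum of the active weights.
    for i in ("a", "b"):
        gps = w[i] * 120.0 / (w["a"] + w["b"])
        assert abs(rates[i] - gps) < 1e-9
    print(rates)
```

Because both terms in Eq. (6) scale with $w_i$, the check also makes the coupling visible: doubling a stream's weight doubles both its guarantee and its share of E(t).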

While the above scheduler is capable of guaranteeing the minimum rate and shaping the rate using the excess bandwidth, it lacks flexibility insofar as the distribution of the excess bandwidth is closely correlated to the assigned weights. From Eq. (5), it is clear that the rate at which stream i is served at time t is the weight $w_i$ multiplied by the sum of the link capacity normalized by the sum of the weights over all streams and the unused bandwidth normalized by the sum of the weights over all active streams. Hence, both the minimum guaranteed rate $M_i$ and the excess rate $E_i(t)$ are proportional to $w_i$. This is not necessarily desirable from the network provider's point of view. The network provider may prefer to distribute the unused bandwidth in a different proportion than that of the minimum guaranteed rates $M_i$.



Therefore, unless otherwise noted, the remaining descrip- 
tion refers to the preferred embodiment of the present 
invention wherein a novel dynamic rate control (DRC) is 
provided which ensures the guaranteed QoS and distributes 
unused bandwidth in an efficient manner decoupled from the 
minimum guaranteed rate. The inventive DRC is not nec- 
essarily work conserving, but rather takes into account 
bottlenecks downstream in determining whether to allocate 
unused bandwidth. 

In the case of a single bottleneck link shared by a set of 
traffic streams, the inventive DRC provides a minimum rate 
guarantee for each stream. Streams which do not make full 
use of their minimum rate guarantees (i.e., streams with 
input rates less than their minimum rate guarantees) con- 
tribute to a pool of excess bandwidth which is made avail- 
able to streams which transmit in excess of their minimum 
rates. In DRC scheduling, the distribution of the excess 
bandwidth is determined by weights assigned to the streams. 
In contrast with weighted fair share schedulers, the share of 
the excess bandwidth which is made available to a stream in 
the inventive DRC is decoupled from the minimum rate 
guarantees; i.e., the share of the unused bandwidth need not 
be proportional to the assigned minimum rate guarantees. 
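The difference between the two sharing rules can be made concrete with a small, hypothetical example: a VBR stream with a positive minimum rate and a UBR stream with a zero minimum rate share one link. Under a weighted fair share rule the excess is split in proportion to the same weights that set the minimum rates (by Eq. (3), $M_i$ is proportional to $w_i$, so the minimum rates themselves are used as weights here), and the UBR stream receives none of it. Under DRC the excess is split by separately chosen weights. The stream names, rates and excess weights below are assumptions chosen only for illustration.

```python
from typing import Dict, Set, Tuple


def split_excess(active: Set[str], weights: Dict[str, float],
                 excess: float) -> Dict[str, float]:
    """Divide the excess bandwidth among active streams in proportion to weights."""
    total = sum(weights[j] for j in active)
    return {j: (excess * weights[j] / total if total > 0 else 0.0) for j in active}


def compare_sharing(active: Set[str], min_rates: Dict[str, float],
                    excess_weights: Dict[str, float],
                    capacity: float) -> Tuple[Dict[str, float], Dict[str, float]]:
    # Excess left over after serving the minimum rates of the active streams.
    excess = capacity - sum(min_rates[j] for j in active)
    coupled = split_excess(active, min_rates, excess)         # weighted fair share
    decoupled = split_excess(active, excess_weights, excess)  # DRC-style
    return coupled, decoupled


if __name__ == "__main__":
    coupled, decoupled = compare_sharing(
        active={"vbr", "ubr"},
        min_rates={"vbr": 40.0, "ubr": 0.0},
        excess_weights={"vbr": 1.0, "ubr": 3.0},
        capacity=100.0,
    )
    print("weighted fair share:", coupled)  # UBR gets no excess
    print("DRC:", decoupled)                # UBR gets 3/4 of the excess
```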

The DRC scheme also strives to provide the minimum 
rate guarantee on a short time-scale. That is, the DRC 
scheduler paces the cells of each stream queue such that the 
spacing between cells belonging to the same stream is no 
smaller than the reciprocal of the minimum rate. If the 
connection admission control determines a certain minimum 
bandwidth requirement for a stream to meet a given QoS, the 
DRC scheduler should be able to deliver the required QoS 
by virtue of its ability to guarantee this minimum rate.
Moreover, the DRC scheduler distributes unused bandwidth 
in a fair manner among the competing traffic streams. 
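Pacing of a single stream queue can be sketched as follows. The sketch assumes a shaping rate given in cells per unit time (for example, the stream's minimum rate) and computes the earliest departure time of each queued cell so that consecutive cells of the stream are separated by at least the reciprocal of that rate. The function `paced_departures` is a hypothetical name used only here, and the sketch ignores contention with other streams at the common queue.

```python
from typing import List


def paced_departures(arrivals: List[float], rate: float) -> List[float]:
    """Earliest departure times for cells of one stream shaped at `rate`.

    A cell departs no earlier than its arrival and no earlier than
    1/rate after the previous departure from the same stream queue.
    """
    if rate <= 0:
        raise ValueError("rate must be positive")
    interval = 1.0 / rate
    departures: List[float] = []
    next_allowed = float("-inf")
    for arrival in sorted(arrivals):
        t = max(arrival, next_allowed)
        departures.append(t)
        next_allowed = t + interval
    return departures


if __name__ == "__main__":
    # Three back-to-back cells at t=0 and one late cell, shaped at 2 cells/unit time.
    print(paced_departures([0.0, 0.0, 0.0, 5.0], rate=2.0))  # [0.0, 0.5, 1.0, 5.0]
```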

When there are multiple bottlenecks in a switch, the DRC 
scheme can eliminate congestion at all potential bottlenecks 
for a given traffic stream. In contrast to the prior art weighted 
share schedulers, the inventive DRC can provide minimum 
rate guarantees even in the presence of multiple bottlenecks 
along a path within the switch. When there are multiple 
bottlenecks, the share of unused bandwidth given to a stream 
at a given bottleneck may also depend on the state of the 
downstream bottlenecks. In this case, rate feedback from 
each bottleneck encountered by a stream is used to choose 
the maximum rate at which a virtual channel (VC) can send 
without causing congestion. Furthermore, DRC can be 
extended beyond the switch in a hop-by-hop flow control scheme which can provide end-to-end QoS guarantees. (As is known in the art, the term virtual channel refers to a link of communication which is established and maintained for the duration of each call. The link is called a virtual channel since, unlike synchronous transmission, there is no set channel designated to a particular caller.)
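One simple reading of the rule that rate feedback from each bottleneck is used to choose the maximum rate at which a VC can send without causing congestion is to take the smallest rate advertised along the stream's path. The sketch below assumes that each bottleneck reports a per-stream rate; the bottleneck names, the data layout and the min rule itself are illustrative assumptions, not a statement of how the feedback is encoded in the switch.

```python
from typing import Dict, List


def allowed_rate(stream: str, path: List[str],
                 feedback: Dict[str, Dict[str, float]]) -> float:
    """Largest sending rate that exceeds none of the rates fed back along `path`.

    feedback[b][s] is the rate that bottleneck b currently advertises for stream s.
    """
    if not path:
        raise ValueError("path must contain at least one bottleneck")
    return min(feedback[b][stream] for b in path)


if __name__ == "__main__":
    fb = {
        "input_port": {"s1": 40.0, "s2": 30.0},
        "output_port": {"s1": 25.0, "s2": 50.0},
    }
    print(allowed_rate("s1", ["input_port", "output_port"], fb))  # 25.0
    print(allowed_rate("s2", ["input_port", "output_port"], fb))  # 30.0
```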

In the inventive DRC scheme, the excess bandwidth is 
shared amongst competing users via the computation of 
dynamic rates in a closed feedback loop. DRC scheduling also requires the internal transmission of control
information within the switch. Notably, the DRC scheme 
lends itself to a relatively simple rate-shaping scheduler 
implementation. Unlike fair share schedulers based on 
timestamps, no searching or sorting is required to find the 
smallest timestamp. 
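Because DRC timestamps are absolute, the streams eligible at the current time CT can be found by direct lookup rather than by searching for the smallest timestamp. The following Python sketch illustrates this with a dictionary keyed by integer time slots (a calendar-style structure). The class and function names, the integer slot granularity, and the fixed per-stream service intervals are assumptions for illustration; the sketch lists every eligible stream per slot and does not model the one-cell-per-slot limit of a real link.

```python
from collections import defaultdict
from typing import DefaultDict, Dict, List


class CalendarScheduler:
    """Hypothetical calendar keyed by absolute time slot."""

    def __init__(self) -> None:
        self.slots: DefaultDict[int, List[str]] = defaultdict(list)

    def schedule(self, stream: str, slot: int) -> None:
        # File the stream under its absolute eligibility time.
        self.slots[slot].append(stream)

    def eligible(self, current_time: int) -> List[str]:
        # Direct lookup by CT: no search or sort over timestamps.
        return self.slots.pop(current_time, [])


def run(intervals: Dict[str, int], horizon: int) -> List[List[str]]:
    """Serve each stream every intervals[s] slots; return eligible streams per slot."""
    cal = CalendarScheduler()
    for s in intervals:
        cal.schedule(s, 0)
    log: List[List[str]] = []
    for t in range(horizon):
        due = cal.eligible(t)
        log.append(due)
        for s in due:
            cal.schedule(s, t + intervals[s])  # next absolute timestamp
    return log


if __name__ == "__main__":
    for t, due in enumerate(run({"a": 2, "b": 3}, horizon=7)):
        print(t, due)
```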

The main features of the DRC scheduler are outlined 
below: 

1. Provides minimum rate guarantees for each stream. 

2. Allows flexible distribution of excess bandwidth. The 
share of excess bandwidth can be determined by:
(a) Static weights according to traffic class (or other criteria) set by the connection admission control (CAC). Each class weight may be multiplied by the number of active virtual channels (VCs) belonging to the given class to achieve fairness with respect to VCs.

(b) Dynamic weights determined according to observed quality-of-service (QoS) by a dynamic closed-loop control mechanism.

3. Provides internal switch congestion control. This is especially advantageous for providing minimum rate guarantees without overflow of buffers.

4. Allows extensibility to provide minimum rate guarantees on an end-to-end basis via hop-by-hop flow control.

Dynamic Rate Control Principle

This section describes the principles behind the inventive dynamic rate control scheduling. Consider a set of N ATM cell streams multiplexed onto a link of capacity C. Each stream may correspond to a single virtual connection (VC) or to a group of VCs belonging to the same traffic class, i.e., a group of VCs requiring the same QoS. Associated with each stream is a queue which stores cells waiting for service, i.e., cells waiting to be transmitted over the link. The function of a scheduler is to determine when the queued cells are to be serviced.

As noted above, the inventive DRC ensures the guaranteed QoS. In the preferred embodiment of the inventive DRC scheduling, the QoS guarantees are mapped onto minimum rate guarantees; however, it should be understood that other mappings may be used. That is, according to the preferred embodiment, the traffic characteristics (obtained from some combination of traffic descriptors and traffic measurement) and QoS requirements for stream i are mapped onto a rate, $M_i$, which is to be provided by the DRC scheduler. If the mapping is done correctly, guaranteeing the rate $M_i$ is then tantamount to providing the QoS guarantee. It is therefore imperative for the scheduler according to the preferred embodiment to be able to guarantee the minimum rate $M_i$ for each stream.

For the purposes of this discussion, it is useful to think of stream i as a group of VCs belonging to the same traffic class, with the same QoS requirements. An embodiment of per-VC queuing will be discussed in a later section. For a given traffic stream i, the bandwidth $M_i$ required to meet cell loss and delay requirements can be computed based on the traffic parameters of the individual VCs and the buffer size. The multiclass connection admission control (CAC) scheme developed in G. Ramamurthy and Q. Ren, Multi-Class Connection Admission Control Policy for High Speed ATM Switches, in Proc. IEEE INFOCOM '97, Kobe, Japan, April 1997, provides procedures for computing $M_i$ for CBR, VBR, and ABR traffic classes based on the traffic parameters declared by individual VCs. The CAC described in that paper takes into account statistical multiplexing gain when there are many VCs belonging to a stream, and it can further be made more aggressive in its allocations by incorporating traffic measurements at the switch. Given the rate $M_i$ for each stream, the most important requirement of a scheduler is to ensure that each stream receives service at rate $M_i$. For … scheduler is assumed. The stream queues are simultaneously served at the dynamic rates $R_i$ as fluid flows. The input streams to the stream queues consist of discrete cells. Each cell brings a batch of work to the stream queue at which it arrives. The actual implementation of the DRC scheme according to the preferred embodiment is an approximation to the idealized model.

Continuous-time Model

Let A(t) be the set of active streams at time t. A stream is considered active if its corresponding stream queue is backlogged. The most general form of the dynamic rate associated with stream i is given by

$$R_i(t) = M_i + \phi_i(t)\, E(t) \qquad (11)$$

where $M_i$ is a minimum guaranteed rate, E(t) is the excess rate available to all streams at the common bottleneck, and $\phi_i(t) \in [0, 1]$ is a normalized weighting factor. That is, the dynamic rate comprises two components: the guaranteed minimum rate, $M_i$, and a part of the unused bandwidth, E(t), determined according to the weighting factor, $\phi_i(t)$, assigned to that stream.

Define

$$E_i(t) = \phi_i(t)\, E(t) \qquad (12)$$

as the variable component of the DRC rate for stream i. The excess rate is defined by:

$$E(t) = C - \sum_{j \in A(t)} M_j \qquad (13)$$

wherein C is the rate of the common queue and $M_j$ is the actual transmission rate of stream j of the set A(t) of active streams at time t. The weights $\phi_i(t)$, $i \in A(t)$, reflect how the excess bandwidth is to be allocated among the streams and are normalized such that

$$\sum_{i \in A(t)} \phi_i(t) = 1 \qquad (14)$$

The weights $\phi_i(t)$ are normalized versions of positive weights $w_i(t)$, $i \in A(t)$, i.e.,

$$\phi_i(t) = \frac{w_i(t)}{\sum_{j \in A(t)} w_j(t)} \qquad (15)$$

Eqs. (11) and (13) define an idealized DRC scheduling scheme for the fluid model. The basic concept of the DRC scheme is illustrated in FIG. 2.

As shown in FIG. 2, each stream i has a queue, Q1-QN, each of which is served at a dynamically variable rate, R1-RN. Each dynamic rate $R_i$, termed the DRC rate, is made up of the guaranteed minimum rate $M_i$ and a share of the unused bandwidth $E_i$. The flow from all the queues is fed to the common queue CQ, which is served at a rate C.

stability of the system, clearly we must have the equation 1 l n practice, it is very difficult to track the set function A(t), 

noted above hold true, i.e., the sum of all the individual rates since it can change with high frequency. Hence, it is imprac- 

must be less or equal to the rate of the common queue. 65 tical to compute the unused bandwidth via Eq. (13). If the set 

In developing the theory behind the inventive Dynamic of traffic streams is reasonably large and the contribution 

Rate Control (DRC) scheme, an idealized fluid model for the from an individual stream is small relative to the aggregate 
stream, the statistical multiplexing gain will be high. Under these conditions, the rate at which E(t) changes by a significant amount should be much slower than the rate at which A(t) changes. This suggests that it is more feasible to track E(t) directly than to track A(t) and then compute E(t) via Eq. (13). In DRC, E(t) is estimated using a feedback control loop, discussed more fully below.

Discrete-time Flow Model

It is instructive to consider a discrete-time model, where time is partitioned into intervals of length Δ. Assume that stream i arrives to its associated stream queue as a constant fluid flow of rate $X_i(n)$ in the time interval $T_n = (n\Delta, (n+1)\Delta)$. In this time interval, stream queue i is served at the constant rate $R_i(n) = M_i + w_i E(n)$, where $w_i$ is a fixed weight assigned to stream i. The output flow rate, $F_i(n)$, from queue i during the interval $T_n$ is then given by

$$F_i(n) = \min\big(X_i(n),\, R_i(n)\big). \tag{16}$$

That is, if the queue is backlogged, the rate would be $R_i(n)$; otherwise it would be the arrival rate $X_i(n)$. The aggregate flow rate to the bottleneck queue during $T_n$ is then

$$F(n) = \sum_{i=1}^{N} F_i(n). \tag{17}$$

The excess bandwidth, E(n), over the interval $T_n$ is the sum of a static, unallocated portion of the bandwidth and a dynamic part of the bandwidth that is currently not used by streams which transmit at less than their minimum guaranteed rates:

$$E(n) = \Big(C - \sum_{i=1}^{N} M_i\Big) + \sum_{i=1}^{N} \big[M_i - X_i(n)\big]^{+}, \tag{18}$$

where $x^{+} = \max(x, 0)$. Again, since it is difficult to obtain knowledge of the input flow rates $X_i(n)$, the present inventors developed an indirect means of computing E(n) via a control loop.

Closed-loop Control

The excess bandwidth E can be estimated via a feedback control loop. By adjusting the value of E, the aggregate flow rate to the bottleneck queue, denoted by F, can be controlled such that:

1. The length of the bottleneck queue is close to a target value $Q_0$; or
2. The average utilization at the bottleneck queue is close to a target utilization value $U_0 < 1$.

Two control algorithms are disclosed herein for estimating E based, respectively, on matching a target queue length and a target utilization. Also disclosed herein is a hybrid control algorithm which combines the merits of the first two.

Matching a Target Utilization

Consider the continuous-time model of the scheduler. Let F(t) denote the aggregate flow rate into the bottleneck queue at time t. We wish to control F(t) to achieve a target utilization $U_0 \in (0,1)$. The following proportional control law can be used:

$$\dot F(t) = -\alpha_0\big(F(t) - U_0 C\big). \tag{19}$$

That is, the rate of change of the aggregate flow rate is proportional to the difference between $U_0 C$, the product of the target utilization and the rate of the common queue, and the aggregate flow rate.

The control system is stable for $\alpha_0 > 0$ and converges with exponential decay rate $\alpha_0$. In terms of the input streams, the aggregate flow rate can be expressed as:

$$F(t) = \sum_{i \in A(t)} R_i(t), \tag{20}$$

and, since the weights $\phi_i(t)$ sum to one over A(t) (Eq. (14)), Eq. (11) gives

$$F(t) = \sum_{i \in A(t)} M_i + E(t). \tag{21}$$

Taking derivatives on both sides (wherever F(t) is differentiable), we have

$$\dot F(t) = \dot E(t). \tag{22}$$

Hence, the control law (19) can be rewritten as:

$$\dot E(t) = -\alpha_0\big(F(t) - U_0 C\big). \tag{23}$$

The control law of Eq. (23) is the basis for a method for estimating E(t) in the discrete-time model. The discrete-time form of Eq. (23) is as follows:

$$E(n+1) = E(n) - \alpha_0\,\epsilon(n), \tag{24}$$

where we define the error signal as $\epsilon(n) = F(n) - U_0 C$. Since the excess bandwidth must lie in the interval [0, C], the control law takes the form:

$$E(n+1) = I_{[0,C]}\big(E(n) - \alpha_0\,\epsilon(n)\big), \tag{25}$$

where $I_{[0,C]}(x) = 1$ if x is greater than or equal to zero and less than or equal to C; otherwise, $I_{[0,C]}(x) = 0$. Over an interval in which the input fluid stream flows are constant, the recursion in Eq. (25) will converge to the correct value for the excess bandwidth E. The speed of convergence depends on the values of the coefficient $\alpha_0$ and the sampling interval Δ.

FIG. 3 shows a block diagram of the controller based on matching the target utilization $U_0$. The error is calculated by adder 10 and is provided to the controller 20. The controller 20 outputs the current excess bandwidth E(n), which is fed to the DRC scheduler 30. In practice, there is a delay τ in the feedback loop between the controller and the scheduler. However, within a switch, this delay τ is typically negligible relative to the sampling interval Δ and can be ignored. The DRC scheduler allocates the excess bandwidth E(n) to the input streams $X_1(n)$ through $X_N(n)$ according to the DRC scheme disclosed above, which results in an aggregate flow rate F(n).

Matching Queue Length

Let Q(t) be the length of the bottleneck queue at time t and let $Q_0$ be a target queue length value. Assuming a fluid model of traffic, the queue length grows according to the aggregate rate less the common queue rate:

$$\dot Q(t) = F(t) - C. \tag{26}$$

The proportional control law,

$$\dot F(t) = -\alpha_0\big(Q(t) - Q_0\big), \tag{27}$$

leads to the equation

$$\ddot Q(t) + \alpha_0\big(Q(t) - Q_0\big) = 0 \tag{28}$$

(See, e.g., L. Benmohamed and S. Meerkov, Feedback Control of Congestion in Packet Switching Networks: The Case of a Single Congested Node, IEEE/ACM Trans. on
Networking, vol. 1, pp. 693-708, December 1993.) The characteristic equation for (28) has a pair of purely imaginary roots, implying non-decaying, oscillatory behavior of Q(t). This problem can be resolved by adding a derivative term to (27), resulting in the proportional-derivative (PD) controller

$$\dot F(t) = -\alpha'_0\big(Q(t) - Q_0\big) - \alpha'_1\,\dot Q(t). \tag{29}$$

The corresponding differential equation governing the behavior of Q(t) is:

$$\ddot Q(t) + \alpha'_1\,\dot Q(t) + \alpha'_0\big(Q(t) - Q_0\big) = 0, \tag{30}$$

which is stable if $\alpha'_0 > 0$ and $\alpha'_1 > 0$. The convergence rate can be assigned arbitrarily by appropriate choices of $\alpha'_0$ and $\alpha'_1$. From (29) and (21), the unused bandwidth can be obtained as:

$$\dot E(t) = -\alpha'_0\big(Q(t) - Q_0\big) - \alpha'_1\,\dot Q(t). \tag{31}$$

A discrete-time controller can then be obtained from (31) as

$$E(n+1) = E(n) - \alpha'_0\,\epsilon(n) - \alpha'_1\,\epsilon(n-1), \tag{32}$$

where $\epsilon(n) = Q(n) - Q_0$ is the error signal sampled at time n. Here, Q(n) is the queue length sampled at time $t = n\Delta$. This controller attempts to keep the queue length near $Q_0$, maintaining the utilization at 100%.

FIG. 4 shows a block diagram of the controller. The target queue length $Q_0$ is subtracted from the queue length at time n, Q(n), by the adder 14, to provide the error $\epsilon(n)$ to the controller 24. The controller 24 outputs the unused bandwidth E(n) to be fed back to the DRC scheduler. Again, a delay τ may be introduced, but it may be ignored when DRC is used within the switch. The DRC scheduler then allocates the available bandwidth to generate an aggregate flow rate F(n).

Hybrid Control

Clearly, the disadvantage of matching the flow rate to a target utilization, $U_0$, is that bandwidth is lost, since $U_0$ must be less than one. However, if queue length information is not readily available, the control algorithm based on flow rate measurement is a viable alternative. The control algorithm based on queue length information achieves 100% utilization. A disadvantage of this algorithm, however, is that when the utilization is less than 100% the system is not controlled. If the utilization is less than 100%, the queue length is zero; hence, E(n) reaches the maximum value C. Now, if the aggregate traffic flow increases to a rate close to C, the queue may grow to a large value before the controller can bring the queue length to the target value $Q_0$.

If both flow rate and queue length information are available, the merits of both controller algorithms may be combined in a hybrid controller as follows:

    if $F(n) < U_0 C$ then
        $E(n+1) = E(n) - \alpha_0\big(F(n) - U_0 C\big)$
    else
        $E(n+1) = E(n) - \alpha'_0\big(Q(n) - Q_0\big) - \alpha'_1\big(Q(n-1) - Q_0\big)$
    end if

We remark that the coefficients ($\alpha_0$ versus $\alpha'_0$, $\alpha'_1$) will be different for each of the two controllers. Thus, when the utilization is less than $U_0 < 1$, the utilization-based controller attempts to match the target utilization. When the utilization exceeds $U_0$, the hybrid controller switches to the queue-based controller. This results in a more controlled queue growth as the aggregate traffic flow increases.

As can be seen, while the underlying principle of the hybrid controller is somewhat similar to the dual proportional-derivative (PD) controller proposed in the Kolarov and Ramamurthy paper cited above, the present invention enables use of the PD controller for real-time traffic. This is made possible, among other things, by implementing the PD controller on a switch rather than on a network-wide basis. Consequently, the delay of the RMCs becomes negligible, so that RMCs can be used to control real-time traffic. Additionally, the practical elimination of the delay allows simplification of the calculation and speed-up of the controller.

Overload Control

The hybrid closed-loop controller adjusts the unused bandwidth E(n) to reduce the magnitude of an error signal

$$\epsilon(n) = \begin{cases} F(n) - U_0 C & \text{if } F(n) < U_0 C, \\ Q(n) - Q_0 & \text{otherwise.} \end{cases} \tag{33}$$

However, the aggregate load may increase faster than the closed-loop controller can respond. The queue length, Q(n), may then grow to a large value before the closed-loop controller can bring it close to the target value $Q_0$. The queue growth is caused by streams which transmit at rates larger than their minimum guaranteed rates. This may result in buffer overflow in the second stage queue and/or unacceptable delays for streams which are transmitting at close to their minimum rates. Depending on the value of Δ, the response time of the closed-loop controller may be too slow to prevent buffer overload in the second stage queue.

Therefore, in the preferred embodiment, an overload control mechanism is employed which reacts quickly to control congestion. DRC scheduling with overload control is illustrated in FIG. 5. When the bottleneck queue length exceeds the shape threshold $Q_1 > Q_0$, a shape signal is transmitted to the DRC scheduler. The DRC scheduler responds by shaping all streams to their minimum rates. This is equivalent to driving the unused bandwidth signal, E(n), to zero. The time for the shape signal to propagate to the DRC scheduler and take effect should be much smaller than the DRC sampling interval Δ. When the queue level falls below $Q_1$, the DRC controller reverts back to the hybrid controller.

To prevent buffer overflow in the second stage queue, a stop signal can be used. When the second stage queue length exceeds a stop threshold $Q_2 > Q_1$, a stop signal is transmitted to the DRC scheduler. In this case, the DRC scheduler completely throttles all traffic streams until the queue length falls below $Q_2$. This control mechanism imposes a maximum length for the second stage queue. In practice, the stop signal should be activated with low probability. This will be the case if the thresholds $Q_1$ and $Q_2$ are chosen correctly. With the shape and stop overload control signals, the scheduling rate for each stream at the nth sampling interval can be expressed as:

$$R_i(n) = \Big[M_i + I_{\{Q<Q_1\}}\big(Q(n)\big)\, w_i E(n)\Big]\, I_{\{Q<Q_2\}}\big(Q(n)\big), \tag{34}$$

where $I_X(x)$ denotes the indicator function on the set X. That is, $I_{\{Q<Q_1\}}(Q(n)) = 1$ if $Q(n) < Q_1$, and $I_{\{Q<Q_1\}}(Q(n)) = 0$ otherwise. Similarly, $I_{\{Q<Q_2\}}(Q(n)) = 1$ if $Q(n) < Q_2$, and $I_{\{Q<Q_2\}}(Q(n)) = 0$ otherwise.
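The hybrid update and the shape/stop rule of Eq. (34) can be combined into one small per-sample routine. The Python sketch below is illustrative only: the class name `HybridDRCController`, its field names, and the clamping of E(n) to [0, C] are assumptions of this sketch (the text only requires E to stay in that interval); the weights $w_i$ are assumed to be already normalized over the active streams, and the queue-based branch uses the $\epsilon(n-1)$ form shown in the hybrid controller above.

```python
from dataclasses import dataclass
from typing import List, Sequence


@dataclass
class HybridDRCController:
    """Hybrid excess-bandwidth controller with shape/stop overload signals (sketch)."""
    C: float              # service rate of the common (bottleneck) queue
    U0: float             # target utilization, 0 < U0 < 1
    Q0: float             # target queue length
    Q1: float             # shape threshold (Q1 > Q0)
    Q2: float             # stop threshold (Q2 > Q1)
    alpha0: float         # gain of the utilization-based branch
    alpha0_q: float       # alpha'_0 of the queue-based branch
    alpha1_q: float       # alpha'_1 of the queue-based branch
    E: float = 0.0        # current estimate of the excess bandwidth E(n)
    prev_Q: float = 0.0   # queue length at the previous sample, Q(n-1)

    def update(self, F: float, Q: float) -> float:
        """Return E(n+1) given the measured flow rate F(n) and queue length Q(n)."""
        if F < self.U0 * self.C:
            # Utilization-based branch.
            new_E = self.E - self.alpha0 * (F - self.U0 * self.C)
        else:
            # Queue-based PD branch, using Q(n) - Q0 and Q(n-1) - Q0.
            new_E = (self.E
                     - self.alpha0_q * (Q - self.Q0)
                     - self.alpha1_q * (self.prev_Q - self.Q0))
        # Assumption of this sketch: keep E within [0, C] by clamping.
        self.E = min(max(new_E, 0.0), self.C)
        self.prev_Q = Q
        return self.E

    def rates(self, M: Sequence[float], w: Sequence[float], Q: float) -> List[float]:
        """Per-stream rates of Eq. (34); w is assumed normalized over active streams."""
        shape = 1.0 if Q < self.Q1 else 0.0   # I_{Q<Q1}(Q(n))
        stop = 1.0 if Q < self.Q2 else 0.0    # I_{Q<Q2}(Q(n))
        return [(m_i + shape * w_i * self.E) * stop for m_i, w_i in zip(M, w)]
```

For example, with C = 100, $U_0$ = 0.9 and a measured flow of 60, a first update raises E from 0 to 15. Once Q(n) exceeds $Q_1$, `rates` returns only the minimum rates, and above $Q_2$ it returns zeros.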
Multiple Bottlenecks

One notable application for the inventive DRC is in a multi-stage ATM switch, such as that exemplified in the related U.S. application Ser. No. 08/923,978, now U.S. Pat. No. 6,324,165. In a multi-stage ATM switch, there are multiple bottleneck points. A stream may pass through several stages before reaching an output line of the switch. In the DRC approach, the streams are rate-controlled at the input stage to control congestion within the switch itself, as opposed to controlling flow in the network. Therefore, the bulk of the cell buffering occurs at the input stage of the switch.

Consider an individual stream which passes through B bottlenecks. At the jth bottleneck, a DRC rate $E^{(j)}(n)$ is computed at the nth sampling interval. Define the overall bottleneck excess rate as:

$$E^{*}(n) = \min_{1 \le j \le B} E^{(j)}(n). \tag{35}$$

Let $Q_1^{(j)}$ and $Q_2^{(j)}$ denote, respectively, the shape and stop thresholds at the jth bottleneck. Define the vectors:

$$\mathbf{Q}_1 = \big(Q_1^{(j)}\big)_{1 \le j \le B}, \qquad \mathbf{Q}_2 = \big(Q_2^{(j)}\big)_{1 \le j \le B}. \tag{36}$$

Denote the queue length at the jth bottleneck at time n by $Q^{(j)}(n)$ and define the vector:

$$\mathbf{Q}(n) = \big(Q^{(j)}(n)\big)_{1 \le j \le B}. \tag{37}$$

Then, in analogy to Eq. (34), the dynamic rate for stream i for the multiple bottleneck case is computed as:

$$R_i(n) = \Big[M_i + I_{\{\mathbf{Q}<\mathbf{Q}_1\}}\big(\mathbf{Q}(n)\big)\, w_i E^{*}(n)\Big]\, I_{\{\mathbf{Q}<\mathbf{Q}_2\}}\big(\mathbf{Q}(n)\big). \tag{38}$$

FIG. 6 shows a set of stream queues, Q1-QN, along with a set of bottleneck queues. At the ith bottleneck queue, a DRC rate, $E_i$, is estimated based on flow and queue length information. For a given stream, e.g., ST3, the overall DRC rate is the minimum of the bottleneck rates for the bottlenecks traversed by the stream. From the perspective of the given stream queue, congestion in downstream bottlenecks is controlled, and the queuing of cells is pushed upstream when congestion arises at one or more bottlenecks. Ultimately, the congestion is pushed back to the stream queue, where most of the queuing takes place for the stream.

Rate-shaping Scheduler

In order to implement DRC, a mechanism for rate-shaping a number of streams is necessary. Two implementations of a scheduler which shapes a set of streams according to DRC rates are disclosed herein. The first is appropriate when the number of streams is relatively small (on the order of a hundred or less), for example, when cells are queued according to class. The second implementation can handle a large number of streams (on the order of tens of thousands), but is slightly more complex.

Scheduling for Rate-shaping

DRC scheduling is implemented using timestamps. A timestamp, TS, is associated with each queue. A stream is active if and only if its corresponding queue is non-empty. Otherwise, the stream is inactive. The DRC scheduler schedules only active streams for service. When a stream is served, the first cell in the associated queue is transmitted to the second stage queue and the status of the stream is updated.

Two distinct timestamp computation formulas are provided, depending on whether a queue is to be scheduled or rescheduled. The timestamp computations ensure that each stream is shaped to the appropriate rate, as determined by the DRC scheme.

Scheduling

A given queue is scheduled when the queue is empty and a new cell arrives to the queue, i.e., when the associated stream changes from the inactive to the active state. The basic formula for computing the new timestamp for scheduling queue i is given as follows:

$$TS_i = \max\{CT,\ TS_i + 1/R_i(n)\}, \tag{39}$$

where CT is the current time, $CT \in (n\Delta, (n+1)\Delta]$, and $R_i(n)$ is the dynamic rate for queue i at time n.

Rescheduling

A given queue is rescheduled after an active stream has been served and the stream remains active, i.e., its associated queue remains non-empty. In this case, the timestamp computation for rescheduling queue i is:

$$TS_i = TS_i + 1/R_i(n). \tag{40}$$

Catching-up with current time

Queue i is said to be ready at current time CT if $TS_i \le CT$. This means that the queue can be served while conforming to its assigned dynamic rate $R_i$. In a practical implementation of DRC scheduling, it is possible that the sum of the dynamic rates may exceed the link capacity, i.e.,

$$\sum_{i} R_i(n) > C. \tag{41}$$

If this condition persists, the set of ready queues may increase and the timestamps of some ready queues may fall behind the current time, CT, by large values.

To correct this situation, a mechanism is provided whereby a queue is scheduled at its minimum guaranteed rate if its associated timestamp falls behind current time by a designated amount. In particular, if the queue timestamp falls behind current time by more than the reciprocal of the minimum guaranteed rate at the time of scheduling/rescheduling, the queue is scheduled/rescheduled at the minimum guaranteed rate. Scheduling a queue at its minimum guaranteed rate allows its timestamp to 'catch up' with the current time clock, by slowing down the rate at which the queue is being scheduled. With the 'catch up' provision, the scheduling and rescheduling procedures are as follows:

Scheduling

    if $TS_i < CT - 1/M_i$ then
        $TS_i = \max\{CT,\ TS_i + 1/M_i\}$
    else
        $TS_i = \max\{CT,\ TS_i + 1/R_i(n)\}$
    end if

Rescheduling

    if $TS_i < CT - 1/M_i$ then
        $TS_i = TS_i + 1/M_i$
    else
        $TS_i = TS_i + 1/R_i(n)$
    end if
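The catch-up rules above reduce to two small timestamp functions plus a readiness test. In the Python sketch below, the function names, the convention that time is measured in cell slots and rates in cells per slot, and the extra boolean result (marking that the minimum rate $M_i$ was used, which is what earns a queue the dynamic HP priority) are illustrative assumptions.

```python
from typing import Tuple


def schedule(ts: float, ct: float, m: float, r: float) -> Tuple[float, bool]:
    """New timestamp when queue i goes from inactive to active.

    ts: previous timestamp TS_i, ct: current time CT,
    m: minimum guaranteed rate M_i, r: dynamic rate R_i(n).
    Returns (new TS_i, True if the catch-up rate M_i was used).
    """
    if ts < ct - 1.0 / m:            # timestamp lags CT by more than 1/M_i
        return max(ct, ts + 1.0 / m), True
    return max(ct, ts + 1.0 / r), False      # Eq. (39)


def reschedule(ts: float, ct: float, m: float, r: float) -> Tuple[float, bool]:
    """New timestamp after queue i has been served and is still non-empty."""
    if ts < ct - 1.0 / m:
        return ts + 1.0 / m, True
    return ts + 1.0 / r, False               # Eq. (40)


def is_ready(ts: float, ct: float) -> bool:
    """Queue i may be served once its timestamp has expired, TS_i <= CT."""
    return ts <= ct
```

In the scheduling case the catch-up branch always returns CT, because $TS_i + 1/M_i < CT$ whenever its condition holds; the flag is what distinguishes it from an ordinary schedule.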
Serving Ready Queues

Serving a ready queue consists of transmitting the first cell in the queue to the bottleneck queue and, if necessary, rescheduling the queue for service. It is possible that several queues may become ready at the same time. In practice, it is not possible to serve all ready queues during one cell time. Thus, a collection of ready queues can form.

The ready queues can in turn be scheduled for service by means of a work-conserving scheduler. For example, ready queues could simply be served in round-robin fashion. Alternatively, a weighted fair share scheduler could be used to ensure a fair short time-scale bandwidth share among the ready queues. However, the improved bandwidth share does not warrant the considerable additional complexity required to implement weighted fair share scheduling.

The preferred embodiment implements a round-robin with four priority levels, listed below in decreasing order of priority:

1. Dynamic high priority (HP)
2. Real-time, Short CDV (RT-S)
3. Real-time, Long CDV (RT-L)
4. Non-real-time (NRT)

The HP priority is a dynamic assignment. Ready queues which have been scheduled at their minimum guaranteed rates are automatically scheduled as HP. This ensures that all streams receive their minimum rate guarantees on a short time-scale. The remaining three priority levels are assigned statically, according to traffic class and tolerance for cell delay variation (CDV). Streams classified as RT-S are real-time streams which have small CDV tolerances, while RT-L streams have larger CDV tolerances. Non-real-time (NRT) streams generally do not have requirements on CDV.

In general, low bit-rate real-time streams would be classified as RT-L, while high bit-rate real-time streams would be classified as RT-S. However, the CDV tolerance of a stream need not be directly related to its bit-rate. The static priority levels protect streams with small CDV tolerance from the bunching effects of streams with larger CDV tolerances. For example, consider a scenario in which one thousand 75 kbps voice streams share a 150 Mbps link with a single 75 Mbps multimedia stream. Assuming that the multimedia stream is constant bit rate (CBR), it needs to send a cell once every two cell times. If cells from the voice streams are bunched together (at or near the same timeslot), the multimedia stream will suffer from severe CDV relative to its inter-cell gap of one cell time. In the worst case, two cells from the multimedia stream could be separated by up to one thousand voice cells.

Per Class Queuing

In the case of per class queuing, when the number of streams is relatively small, the scheduler can be implemented with a parallel array of comparators. The ith comparator takes as inputs CT and $TS_i$, and evaluates

$$f_i = \begin{cases} 1 & \text{if } CT \ge TS_i \text{ and queue } i \text{ is active,} \\ 0 & \text{otherwise.} \end{cases} \tag{42}$$

Queues with $f_i = 1$ have had their timestamps expire and hence are ready for service. These queues are served using round-robin with priority based on a priority flag $p_i$, assigned as follows:

$$p_i = \begin{cases} 0 & \text{if queue } i \text{ is scheduled at rate } M_i, \\ 1 & \text{if queue } i \text{ is RT-S}, \\ 2 & \text{if queue } i \text{ is RT-L}, \\ 3 & \text{if queue } i \text{ is NRT}. \end{cases} \tag{43}$$

A logical view of the scheduler is illustrated in FIG. 7. The scheduler performs a round-robin search for a queue i with $f_i = 1$ and $p_i = 0$. If no such queue is found, the scheduler searches for a queue i satisfying $f_i = 1$ and $p_i = 1$. The process continues until either a queue is found or all priority levels have been exhausted.

Per VC Queuing

In per VC queuing, the number of queues is on the order of tens of thousands. In this case, it is not economical to implement the scheduler using a parallel array of comparators. Instead, a scheduler based on a time wheel data structure is preferable. As shown in FIG. 8, each bin in the time wheel points to four linked lists (one for each priority level) of VC identifiers whose timestamps correspond to the bin label. During each time slot, the current time CT advances to point to the next bin. The VC identifiers linked to bins which have been passed by CT are ready for service, and a ready list of these VCs is constructed. VCs are then served from the ready list in round-robin fashion, according to priority level.

Providing Multi-class Quality-of-Service

In order to provide quality-of-service (QoS), a connection admission control (CAC) algorithm is necessary to determine whether or not a new VC can be accepted while satisfying the QoS requirement of the new VC and the existing VCs. DRC scheduling simplifies the CAC function by providing a direct mapping between the bandwidth required to provide QoS to a stream and the rate at which the stream is scheduled for service. In particular, the CAC determines a minimum required bandwidth to provide QoS for a traffic stream, and this bandwidth is used as the minimum guaranteed rate for the stream under DRC. If the DRC scheduler is able to provide the minimum guaranteed rate, then the QoS for the stream will also be satisfied. In DRC, the method of computing the DRC rate depends on whether cells are queued by class or by VC.

From an analysis point of view, per VC queuing is preferable; however, from an implementation point of view, per VC queuing may not be preferable since it requires a large number of queues. The main advantage of per VC queuing is that each VC is guaranteed a minimum rate. Additionally, in per VC queuing, all downstream bottlenecks are considered in calculating the rate, thus avoiding overflow of downstream buffers. Also, in per VC queuing there would be no head-of-line blocking. On the other hand, per class queuing is simpler to implement if only one downstream bottleneck is considered. Notably, such an arrangement requires fewer queues.
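For the per class case, the comparator array of Eq. (42), the priority flags of Eq. (43), and the FIG. 7 search can be sketched together. In this Python sketch the class name `PriorityRoundRobin` and the choice of keeping one round-robin pointer per priority level are assumptions; the text does not specify how the round-robin position is maintained.

```python
from typing import Optional, Sequence

# Priority flags of Eq. (43): a lower value is served first.
HP, RT_S, RT_L, NRT = 0, 1, 2, 3


class PriorityRoundRobin:
    """Priority round-robin selection over a comparator array (sketch)."""

    def __init__(self, num_queues: int, levels: int = 4) -> None:
        self.n = num_queues
        self.levels = levels
        # Assumption: an independent round-robin pointer for each priority level.
        self.next_start = [0] * levels

    def select(self, ts: Sequence[float], active: Sequence[bool],
               prio: Sequence[int], ct: float) -> Optional[int]:
        """Return the index of the queue to serve in this cell time, or None."""
        # f_i of Eq. (42): queue is active and its timestamp has expired.
        f = [active[i] and ct >= ts[i] for i in range(self.n)]
        for level in range(self.levels):          # HP first, NRT last
            start = self.next_start[level]
            for k in range(self.n):
                i = (start + k) % self.n
                if f[i] and prio[i] == level:
                    self.next_start[level] = (i + 1) % self.n
                    return i
        return None                               # no ready queue found
```

Two ready RT-S queues are thus served alternately, and an NRT queue is chosen only when no ready queue exists at any higher level. The caller is expected to reschedule the served queue, which updates its timestamp before the next call.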
First, queues will be described for the case wherein a static weight, $w_i$, is assigned to each class by the CAC. The value of the weight $w_i$ determines the share of the free bandwidth that is allocated to class i. Thereafter, a method wherein the weights are modified dynamically in the context of a closed-loop control, based on observed QoS, will be disclosed.

Per Class Queuing

When cells are queued by class, the traffic stream corresponding to class i is an aggregate of several individual VC streams. The CAC determines a minimum required bandwidth, denoted $M_i$, needed to satisfy the cell loss and cell delay requirements. The CAC provides the DRC scheduler with the minimum guaranteed rate, $M_i$, and also the class weight, $w_i$. The dynamic rate for stream i at a single bottleneck point is computed as
$$R_i(n) = M_i + \frac{w_i}{\sum_{j \in A(n)} w_j}\,E(n), \tag{44}$$

where $w_i$ is the weight assigned to stream i and A(n) is the set of 'active' streams over the time interval $((n-1)\Delta, n\Delta]$. To simplify notation, we shall implicitly assume that all computed rates are multiplied by the indicator function $I_{(0,C]}(x)$ to ensure that the rates fall in the range [0, C]. Methods for estimating the number (or weighted sum) of active streams or VCs, e.g., the sum

$$S(n) = \sum_{j \in A(n)} w_j \tag{45}$$

in Eq. (44), will be discussed further below.

Suppose that we can obtain an estimate $n_i(n)$ of the number of active VC streams composing stream i. By setting $w_i(n) = n_i(n)$, the unused bandwidth can be distributed fairly with respect to the individual VCs. In this case, each VC will receive an equal share of the excess bandwidth E(n). More generally, both a class weight and the number of active VCs can be taken into account in distributing the excess bandwidth. If $\omega_i$ denotes the static class weight, the DRC weights would be assigned as $w_i(n) = \omega_i\, n_i(n)$.

Per VC Queuing

In per VC queuing, for N VCs there are N VC queues. Each VC queue belongs to one of K classes. Let c(i) denote the class to which VC i belongs and let $C_k$ denote the set of VCs belonging to class k. VC i is assigned a minimum guaranteed rate $M_i$. Class k is assigned a minimum guaranteed bandwidth $M_k$, which is sufficient to guarantee QoS for all VCs in class k. When VC i in class k becomes inactive, the unused bandwidth $M_i$ is first made available to VCs belonging to class c(i), up to the class guaranteed bandwidth $M_k$, and then to the VCs in the other classes.

In the per VC paradigm a dynamic rate is computed for VC i as follows:

$$R_i(n) = M_i + \frac{E_{c(i)}(n)}{n_{c(i)}(n)}, \tag{46}$$

where $E_k(n)$ denotes the estimated unused bandwidth for class k at the sampling instant n. We assume that the flow rate at time n, $F_k(n)$, of class k cells into the common queue can be measured. Also, we assume that it is possible to count the number, $Q_k(n)$, of class k cells in the common queue at time n. The common queue length is given by

$$Q(n) = \sum_{k=1}^{K} Q_k(n). \tag{47}$$

The excess bandwidth at the bottleneck point, denoted E(n), is estimated as a function of the aggregate flow rate, F(n), and the common queue length, Q(n). It can be computed using the hybrid PD controller discussed earlier:

    if $F(n) < U_0 C$ then
        $E(n+1) = E(n) - \alpha_0\big(F(n) - U_0 C\big) - \alpha_1\big(F(n-1) - U_0 C\big)$
    else
        $E(n+1) = E(n) - \alpha'_0\big(Q(n) - Q_0\big) - \alpha'_1\big(Q(n-1) - Q_0\big)$
    end if

The minimum guaranteed class bandwidth, $M_k$, is determined by the CAC. The dynamic rate for class k is computed as:

$$R_k(n) = M_k + \frac{w_k\, n_k(n)}{\sum_{j=1}^{K} w_j\, n_j(n)}\,E(n), \tag{48}$$

where $w_k$ is the weight for class k and $n_k(n)$ is an estimate of the number of active VCs belonging to class k.

The dynamic class rate $R_k(n)$ represents the bandwidth available to VCs in class k. For VCs in class k, the unused bandwidth is computed with respect to $R_k(n)$. The unused bandwidth for class k, $E_k(n)$, can be computed using the hybrid PD controller as follows:

    if $F_k(n) < U_0^{(k)} R_k(n)$ then
        $E_k(n+1) = E_k(n) - \alpha_0\big(F_k(n) - U_0^{(k)} R_k(n)\big) - \alpha_1\big(F_k(n-1) - U_0^{(k)} R_k(n-1)\big)$
    else
        $E_k(n+1) = E_k(n) - \alpha'_0\big(Q_k(n) - Q_0^{(k)}\big) - \alpha'_1\big(Q_k(n-1) - Q_0^{(k)}\big)$
    end if

Here, $U_0^{(k)}$ and $Q_0^{(k)}$ are, respectively, the target utilization and target queue length for class k.

Thus, the above novel per VC queuing accomplishes a two-tier distribution of the unused bandwidth: a first distribution according to class, and a second distribution according to VCs within the class.
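The two tiers, Eq. (48) for classes and Eq. (46) for VCs within a class, can be sketched as two functions. In this illustrative Python sketch the function names and dictionary-based inputs are assumptions; the class-level unused bandwidths $E_k(n)$ are passed in directly rather than produced by the class-level hybrid controller, and the $I_{(0,C]}$ range restriction is omitted.

```python
from typing import Dict, List


def class_rates(M_class: Dict[int, float], w_class: Dict[int, float],
                n_active: Dict[int, int], E: float) -> Dict[int, float]:
    """Tier 1, Eq. (48): R_k(n) = M_k + w_k n_k(n) E(n) / sum_j w_j n_j(n)."""
    denom = sum(w_class[k] * n_active[k] for k in M_class)
    return {k: M_class[k] + (w_class[k] * n_active[k] / denom * E if denom > 0 else 0.0)
            for k in M_class}


def vc_rates(M_vc: Dict[int, float], vc_class: Dict[int, int],
             active: List[int], E_class: Dict[int, float]) -> Dict[int, float]:
    """Tier 2, Eq. (46): R_i(n) = M_i + E_{c(i)}(n) / n_{c(i)}(n) for each active VC."""
    counts: Dict[int, int] = {}
    for i in active:                      # n_k(n): active VCs per class
        counts[vc_class[i]] = counts.get(vc_class[i], 0) + 1
    return {i: M_vc[i] + E_class[vc_class[i]] / counts[vc_class[i]] for i in active}
```

For example, `class_rates({0: 40.0, 1: 30.0}, {0: 1.0, 1: 2.0}, {0: 2, 1: 1}, E=20.0)` gives class 1 twice the weight but half as many active VCs as class 0, so both classes receive an equal 10-unit share of E(n).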

In equation (46) it is implied that $E_{c(i)}(n)$ is distributed evenly among the active streams within the class. However, continuing with the DRC theme, one may elect to have a variable distribution. This can be easily accomplished by introducing a weight factor, e.g., $\phi_i$, assigned to each VC within the class. Thus, the rate would be computed as:

Closed-loop Quality-of-Service Control 

In the DRC scheme, the excess bandwidth, E(n), is 
computed via a feedback control loop. This excess band- 
width is distributed among the streams by means of the 
weights, $w_i$. In the previous section, the weights were
assumed to be static, and chosen by the CAC. However, the 
most general form of DRC allows the weights to be time- 
varying (cf. Eq. (2)). In the following, we develop a scheme 
for providing closed-loop QoS control in conjunction with 
DRC, whereby the weights are adjusted based on the
observed QoS of the streams. For the purposes of this 
discussion, we will assume per class queuing, although the 
same methodology applies under per VC queuing as well. 

Concept 

As discussed in the previous section, a CAC function is 
necessary to guarantee QoS. However, since the traffic 
streams are inadequately characterized, the CAC may over- 
estimate or underestimate the amount of bandwidth which 
should be allocated to a stream in order to guarantee QoS. 
The only way to determine whether QoS guarantees are 
being satisfied is to measure, in some way, the QoS expe- 
rienced by an individual stream. If the measured QoS falls 
below the target QoS, more bandwidth should be allocated 
to the stream. Conversely, if the QoS for a stream exceeds 
the target QoS, it may be possible to take some bandwidth 
away from the stream and make it available to other streams 
which are in greater need of the bandwidth.

A dynamic or measurement-based CAC (cf. G. Ramamurthy cited above and S. Jamin, P. Danzig, S. Shenker, and L. Zhang, A Measurement-Based Admission Control Algorithm for Integrated Service Packet Networks, IEEE/ACM Trans. on Networking, vol. 5, pp. 56-70, February 1997)
adjusts the bandwidth assigned to a stream based on traffic 
measurements. Measurements take place over a relatively 
long time interval (e.g., an order of magnitude less than call 
holding times). Then corrections are made to the CAC 
bandwidth allocation. For example, it may be possible to 
obtain reasonable estimate of cell loss probability over a 
time interval. A further challenge is how to determine how 
much bandwidth should be added to a connection suffering 
QoS degradation. The closed-loop QoS control method 
disclosed in this section operates on a shorter time-scale 
compared with the dynamic CAC, e.g., on the same order as 
the dynamic rate computations. The closed-loop control can 
make short-term corrections to alleviate errors made by the 
CAC. The closed-loop control can also work well in con- 
junction with a dynamic CAC. 
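A minimal sketch of the closed-loop weight assignment developed in this section follows. It assumes the QoS measure is the average cell delay observed over the last measurement interval. It uses the normalized deviation $(q_i(t) - q_i^*)/q_i^*$. As one positive choice consistent with each stream bidding according to the ratio of its perceived QoS to its target QoS, it sets each weight proportional to $1 + (q_i(t) - q_i^*)/q_i^* = q_i(t)/q_i^*$, normalized to sum to one. This normalization and the function names are assumptions and may differ from the exact form of Eq. (50).

```python
from typing import Dict, List


def average_delay(cell_delays: List[float]) -> float:
    """Average cell delay D(n) over one measurement interval (0 if idle)."""
    return sum(cell_delays) / len(cell_delays) if cell_delays else 0.0


def qos_weights(observed: Dict[str, float],
                target: Dict[str, float]) -> Dict[str, float]:
    """Weights proportional to observed/target QoS, normalized to sum to one.

    observed/target equals 1 + the normalized deviation, so it stays
    nonnegative even for a stream that beats its target.
    """
    ratios = {s: observed[s] / target[s] for s in observed}
    if not ratios:
        return {}
    total = sum(ratios.values())
    if total == 0:
        # No stream saw any delay: fall back to equal shares.
        return {s: 1.0 / len(ratios) for s in ratios}
    return {s: r / total for s, r in ratios.items()}


# Stream "a" sees twice its delay target; stream "b" sees half of it.
observed = {"a": average_delay([1.5, 2.5]), "b": average_delay([0.5])}
print(qos_weights(observed, {"a": 1.0, "b": 1.0}))  # {'a': 0.8, 'b': 0.2}
```

The stream furthest from its target receives the larger share of $E(n)$. A stream beating its target still keeps a small nonzero share, provided it saw some delay.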

The basic idea is to assign the weight, $w_i(t)$, in proportion to the deviation between the observed QoS at time $t$ and a target QoS measure for stream $i$. Let $q_i(t)$ be the observed QoS for stream $i$ in the first-stage queue $i$ at time $t$. For example, $q_i(t)$ could represent a measure of the cell delay experienced at time $t$. Let $q_i^*$ represent the target QoS for queue $i$. The normalized deviation of the observed QoS from the target QoS is given by:

$$\frac{q_i(t) - q_i^*}{q_i^*} \qquad (49)$$

Since the weights should always be positive, we assign them as follows:

(50)

With this assignment of weights, streams that are further away from their QoS targets receive greater shares of the excess bandwidth. Conversely, streams that are meeting or exceeding their QoS targets receive proportionately smaller shares of the excess bandwidth.

The closed-loop QoS control attempts to minimize the worst-case deviation of a given stream from its target QoS. With the assignment of weights according to Eq. (50), each stream effectively makes a bid for the excess bandwidth based on the ratio of its perceived QoS to its target QoS. In this way, a stream suffering from poor QoS automatically takes more of the available excess bandwidth than a stream that is meeting or exceeding its target QoS. Closed-loop QoS control can thus make short-term corrections to errors in the bandwidth allocation made by the CAC.

QoS Measurement

In a real implementation, the QoS of a stream must be measured over a time interval, and updates to the weights take place at discrete times. A reasonable scheme is to recompute the weights directly either before or after the computation of the excess bandwidth. As an example, the average cell delay, $D(n)$, in the first-stage queue for a stream can be measured over the time interval $(n\Delta, (n+1)\Delta]$. Average queue length is another relatively simple measure that can be used in the weighting function.

On the other hand, it is very difficult to estimate cell loss probability over a relatively short period of time; this QoS metric can only be estimated over a longer time interval. A dynamic CAC might be able to estimate cell loss probability and to allocate sufficient bandwidth to correct the cell loss probability for the next measurement interval, based on observations of the traffic over the current interval.

Congestion Control via Dynamic Rate Control

In this section, we discuss how DRC can be used to control congestion at bottlenecks within a multi-stage switch. We then discuss how DRC can be extended beyond the local switch to provide hop-by-hop congestion control.

Input-Output Buffered Switch

Prior-art weighted fair share schedulers distribute bandwidth, according to their weights, with respect to a single bottleneck. Minimum rate guarantees can be provided with respect to this bottleneck. However, if there is a second, downstream bottleneck, a prior-art weighted fair share scheduler may not be able to provide bandwidth guarantees. The input streams to the second-stage bottleneck may originate from different bottleneck points in the first stage. If these first-stage bottlenecks are scheduled independently by weighted fair share schedulers, congestion at the common second-stage bottleneck may arise, resulting in the loss of the rate guarantee.

FIG. 9 shows an example of a hypothetical N×N input-output buffered switch. Input modules IM1-IMN and output modules OM1-OMN, respectively, are connected to a core switching element 101 having a central high-speed bus 120 (e.g., a time-division multiplexed bus). Each input module has a scheduler which schedules cells to be transmitted over the bus 120. Each output module consists of buffers RT1-RTN which operate at the speed of the bus, i.e., N times the line speed. When the output buffer occupancy reaches a certain threshold, a signal is broadcast to all input modules. The signal causes all input modules to throttle the flow of traffic to the given output module. This prevents buffer overflow at the output module.

Consider two streams, $s_1$ and $s_2$, originating from different input modules and destined to the same output module. Assume that both streams are continuously backlogged and suppose they are scheduled using weighted fair share schedulers. Since the schedulers are work conserving, the output cell rate from each input module will be equal to the line rate $C$. The output module buffer level will eventually exceed the
backpressure threshold. The backpressure signal will throttle both input modules until the buffer occupancy at the output module falls below the stop threshold. Because the same signal throttles and releases both input modules, the two streams share the output line rate equally, so the throughput received by each stream will be 0.5C. With weighted fair share schedulers, it is not possible to achieve different throughputs for the two streams, because the schedulers are work conserving with respect to the first-stage bottleneck.
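The equal split can be checked with a toy discrete-time model. This is a sketch only and does not model the FIG. 9 hardware. Time is slotted in cell times at the line rate $C$. Each of the two always-backlogged, work-conserving input modules sends one cell per slot unless throttled, and the output module drains one cell per slot. Backpressure is asserted when occupancy reaches a hypothetical high threshold and released once occupancy falls below a hypothetical stop threshold.

```python
from typing import List


def backpressure_shares(slots: int = 10_000, high: int = 20,
                        low: int = 10) -> List[float]:
    """Fraction of the line rate C obtained by each of two input modules.

    Both modules are throttled and released by the same broadcast signal,
    so they always transmit in exactly the same slots.
    """
    occupancy = 0
    throttled = False
    sent = [0, 0]
    for _ in range(slots):
        if not throttled:
            for i in range(2):      # each input module sends at line rate
                sent[i] += 1
                occupancy += 1
        if occupancy > 0:           # output line drains one cell per slot
            occupancy -= 1
        if occupancy >= high:       # broadcast backpressure
            throttled = True
        elif occupancy < low:       # release below the stop threshold
            throttled = False
    return [n / slots for n in sent]


print(backpressure_shares())  # both shares close to 0.5
```

Changing the per-stream weights inside each input module's scheduler would not alter this outcome, since each module is simply on or off together with the other.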

On the other hand, if DRC schedulers are employed at the input modules, the output streams from the input modules can be shaped to different rates. For example, suppose $M_1 = 0.1C$ and $M_2 = 0.8C$. Then the excess bandwidth at the output module bottleneck is $C - M_1 - M_2 = 0.1C$. If this excess bandwidth is distributed evenly between the two streams, the throughputs for the two streams will be $R_1 = 0.15C$ and $R_2 = 0.85C$, respectively.
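The arithmetic of this example can be reproduced with exact fractions, with rates expressed in units of the line rate $C$. The helper below is hypothetical. It takes the excess as $C - \sum_i M_i$ and splits it by weights. Even weights give the figures above. Unequal weights, such as 1/4 and 3/4, show that the excess need not be shared evenly or in proportion to the minimum rates.

```python
from fractions import Fraction
from typing import List


def shaped_rates(line_rate: Fraction, min_rates: List[Fraction],
                 weights: List[Fraction]) -> List[Fraction]:
    """R_i = M_i + w_i * (C - sum of M_j) at the output-module bottleneck."""
    excess = line_rate - sum(min_rates)
    if excess < 0:
        raise ValueError("minimum rates exceed the bottleneck capacity")
    return [m + w * excess for m, w in zip(min_rates, weights)]


C = Fraction(1)                             # work in units of C
mins = [Fraction(1, 10), Fraction(8, 10)]   # M_1 = 0.1C, M_2 = 0.8C
print(shaped_rates(C, mins, [Fraction(1, 2), Fraction(1, 2)]))
# [Fraction(3, 20), Fraction(17, 20)], i.e. 0.15C and 0.85C
print(shaped_rates(C, mins, [Fraction(1, 4), Fraction(3, 4)]))
# [Fraction(1, 8), Fraction(7, 8)], i.e. 0.125C and 0.875C
```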

Hop-by-Hop Dynamic Rate Control 

Dynamic Rate Control can be extended beyond the local switch if cells are queued per VC. If the downstream switch is able to transmit rate information on a per-VC basis, e.g., through resource management (RM) cells, the downstream rate can be used by the local DRC scheduler to schedule VCs as follows:

$$R = \min(R_{local}, R_{downstream}).$$
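A minimal sketch of the per-VC hop-by-hop rule follows. It assumes the most recent downstream rate for each VC, for example as carried in RM cells, has already been extracted into a table. The table and function names are hypothetical. A VC with no downstream feedback is assumed to fall back to its local DRC rate.

```python
from typing import Dict


def hop_by_hop_rates(local: Dict[str, float],
                     downstream: Dict[str, float]) -> Dict[str, float]:
    """R = min(R_local, R_downstream) per VC; local rate if no feedback."""
    return {vc: min(r, downstream.get(vc, r)) for vc, r in local.items()}


local = {"vc1": 10.0, "vc2": 4.0, "vc3": 7.0}
downstream = {"vc1": 6.0, "vc2": 9.0}       # no RM feedback for vc3 yet
print(hop_by_hop_rates(local, downstream))
# {'vc1': 6.0, 'vc2': 4.0, 'vc3': 7.0}
```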

While the invention has been described with reference to
specific embodiments thereof, it will be appreciated by those
skilled in the art that numerous variations, modifications,
and embodiments are possible, and accordingly, all such
variations, modifications, and embodiments are to be
regarded as being within the spirit and scope of the
invention.

It should be appreciated that while the above disclosure is provided in terms of scheduling cells in an ATM switch, the inventive dynamic rate control (DRC) can be implemented for scheduling data packets in packet switching. For example, Equations (39) and (40) can easily be modified to account for the packet's length as follows:

$$TS_i = \max\{CT,\; TS_i + L/R_i(n)\}, \qquad (39')$$

$$TS_i = TS_i + L/R_i(n), \qquad (40')$$

wherein $L$ represents the length of the packet at the head of the queue being scheduled.
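The two packet-length updates can be sketched as below. Time is in seconds, $L$ is in bytes, $R_i(n)$ is in bytes per second, and CT is read as the current time. Exact fractions avoid rounding in the timestamps. The function names are hypothetical. The demonstration assumes (39') is applied when a packet reaches the head of an idle queue, and (40') when the next packet is stamped after a departure. The triggering events are actually defined by the cell-based Equations (39) and (40).

```python
from fractions import Fraction


def ts_reactivate(ts: Fraction, now: Fraction, length: int,
                  rate: int) -> Fraction:
    """Eq. (39'): TS_i = max{CT, TS_i + L / R_i(n)}."""
    return max(now, ts + Fraction(length, rate))


def ts_advance(ts: Fraction, length: int, rate: int) -> Fraction:
    """Eq. (40'): TS_i = TS_i + L / R_i(n)."""
    return ts + Fraction(length, rate)


rate = 1000                                  # R_i(n) in bytes per second
ts = ts_reactivate(Fraction(0), Fraction(2), 500, rate)
print(ts)                                    # 2: the current time CT is later
ts = ts_advance(ts, 1500, rate)
print(ts)                                    # 7/2: a 1500-byte packet adds 1.5 s
```

A longer packet pushes the queue's next eligibility time further out, so the shaped rate holds in bytes per second rather than in packets per second.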
What is claimed is: 

1. A method of rate-based cell scheduling of a plurality of 
cells arriving at an ATM switch having a plurality of queues, 
comprising: 

directing each one of said plurality of cells to a respective 
queue; 

assigning a respective minimum rate guarantee for each of 
said queues; 

assigning a respective excess rate share for each of said 
queues; 

estimating excess bandwidth on a downstream link; 

transmitting said cell from said queues according to the 
respective minimum rate guarantee, while distributing 
the excess bandwidth to said queues according to said 
excess rate share. 

2. A method of rate-based scheduling at an ATM switch 
having a plurality of input queues, comprising the steps of: 

assigning a minimum guaranteed rate for each of said 
queues; 
computing a dynamic rate for each of said queues;

shaping each stream arriving at each of said queues
according to the respective minimum guaranteed rate
and the respective dynamic rate.

3. The method of claim 2, wherein said ATM switch
further comprises a plurality of output buffers, said method
further comprising the steps of:

monitoring the level of each of said buffers and, when one
of said buffers reaches a predetermined level, generating
a shape signal identifying said one buffer;

scheduling said queues according to a rate composed of
the minimum guaranteed rate plus the dynamic rate
when said shape signal is not generated and scheduling
said queues according to said minimum rate when said
shape signal is generated.

4. A method of rate-based cell scheduling of a plurality of 
cell streams, comprising the steps of: 

assigning a minimum rate for each of said plurality of cell 
streams; 

calculating a dynamic rate for each of said cell streams,
said dynamic rate comprising a product of an assigned
weight and an estimated excess bandwidth at a
downstream bottleneck; and

adding each dynamic rate to a corresponding minimum
rate.

5. The method of claim 4, wherein the excess bandwidth 
is estimated via a feedback control loop. 

6. The method of claim 4 wherein said assigned weight is 
static. 

7. The method of claim 4 wherein said assigned weight is
dynamic.

8. A method for queuing a plurality of virtual channels in
an ATM switch having a plurality of input buffers, comprising
the steps of:

assigning an input buffer for each of said virtual channels;

assigning a minimum guaranteed rate for each of said
buffers;

assigning a weight for each of said buffers;

calculating a dynamic rate for each of said buffers, said
dynamic rate comprising the minimum guaranteed rate
plus a portion of an unused bandwidth of said switch,
said portion being proportional to the assigned weight;
and

shaping transmissions from said buffers according to the
dynamic rate.

9. The method of claim 8, wherein each of said buffers is
assigned to only one virtual channel.

10. The method of claim 8, wherein each of said buffers
is assigned to a plurality of virtual channels having similar
quality of service requirements, further comprising the step
of:

distributing the dynamic rate of each buffer to its respective
active virtual channels.

11. The method of claim 10, wherein said dynamic rate is
distributed evenly among the respective active virtual
channels.

12. The method of claim 10, wherein each of said virtual
channels is assigned a secondary weight and wherein the
dynamic rate is distributed to the respective virtual channels
according to the respective secondary weight.

13. A method for controlling overload in a buffer
comprising:

monitoring a load level in said buffer;

when said load level reaches a first threshold, generating
a shape signal to cause input to said buffer to be
reduced to a minimum level;

when said load level reaches a second threshold,
generating a stop signal to halt any input to said buffer;

estimating an unused bandwidth available on said buffer;
and

generating a signal indicating said estimate.

14. A method of rate-based scheduling of a plurality of
data packets arriving at a switch having a plurality of
queues, comprising:

directing each one of said plurality of packets to a
respective queue;

assigning a respective minimum rate guarantee for each of
said queues;

assigning a respective excess rate share for each of said
queues;

estimating excess bandwidth on a downstream link;

transmitting said packet from said queues according to the
respective minimum rate guarantee, while distributing
the excess bandwidth to said queues according to said
excess rate share.

15. A method of rate-based scheduling at a switch having
a plurality of input queues, comprising the steps of:

assigning a minimum guaranteed rate for each of said 
queues; 

computing a dynamic rate for each of said queues; 

shaping each packet stream arriving at each of said queues 
according to the respective minimum guaranteed rate 
and the respective dynamic rate. 






